Self-Supervision for Reinforcement Learning
Authors
Abstract
Reinforcement learning optimizes policies for expected cumulative reward. Need the supervision be so narrow? Reward is delayed and sparse for many tasks, making it a difficult and impoverished signal for end-to-end optimization. To augment reward, we consider a range of self-supervised tasks that incorporate states, actions, and successors to provide auxiliary losses. These losses offer ubiquitous and instantaneous supervision for representation learning even in the absence of reward. While current results show that learning from reward alone is feasible, pure reinforcement learning methods are constrained by computational and data efficiency issues that can be remedied by auxiliary losses. Self-supervised pretraining and joint optimization improve the data efficiency and policy returns of end-to-end reinforcement learning.
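To make the idea of an auxiliary loss over states, actions, and successors concrete, here is a minimal sketch of joint optimization. It assumes one illustrative auxiliary task, inverse dynamics (predicting the action taken from a state and its successor); the abstract does not specify which tasks are used. The names `encode`, `inverse_dynamics_loss`, `joint_loss`, and `aux_weight` are hypothetical, and the linear encoder, the weight of 0.1, and the policy loss value of 0.5 are placeholders rather than values from the paper.

```python
import math
import random

def encode(state, enc_w):
    """Shared encoder (assumed: one linear layer + tanh) mapping a state to features."""
    return [math.tanh(sum(w * x for w, x in zip(row, state))) for row in enc_w]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def inverse_dynamics_loss(state, action, successor, enc_w, head_w):
    """Cross-entropy for predicting the taken action from (state, successor).

    This loss needs no reward, so it is available at every transition.
    """
    feats = encode(state, enc_w) + encode(successor, enc_w)  # concatenate features
    logits = [sum(w * f for w, f in zip(row, feats)) for row in head_w]
    return -math.log(softmax(logits)[action])

def joint_loss(policy_loss, aux_loss, aux_weight=0.1):
    """Policy loss plus a weighted auxiliary loss (aux_weight is an assumed value)."""
    return policy_loss + aux_weight * aux_loss

# Toy transition (s, a, s') with randomly initialized parameters.
rng = random.Random(0)
state_dim, feat_dim, n_actions = 4, 3, 2
enc_w = [[rng.uniform(-1, 1) for _ in range(state_dim)] for _ in range(feat_dim)]
head_w = [[rng.uniform(-1, 1) for _ in range(2 * feat_dim)] for _ in range(n_actions)]

s = [0.1, -0.2, 0.3, 0.0]
s_next = [0.2, -0.1, 0.3, 0.1]
aux = inverse_dynamics_loss(s, 1, s_next, enc_w, head_w)
total = joint_loss(policy_loss=0.5, aux_loss=aux)  # 0.5 stands in for an RL loss
print(aux, total)
```

In a real agent the encoder would be shared with the policy network, so gradients from the auxiliary loss shape the same representation the policy uses. For pretraining, the auxiliary loss would be minimized alone before any reward-driven updates.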
Similar resources
Loss is its own Reward: Self-Supervision for Reinforcement Learning
Reinforcement learning optimizes policies for expected cumulative reward. Need the supervision be so narrow? Reward is delayed and sparse for many tasks, making it a difficult and impoverished signal for end-to-end optimization. To augment reward, we consider a range of self-supervised tasks that incorporate states, actions, and successors to provide auxiliary losses. These losses offer ubiquito...
Supervision of Midwives at the University of Hertfordshire
Supervision of Midwives is a statutory responsibility which provides a “mechanism for support and guidance to every midwife practising in the United Kingdom” (Nursing and Midwifery Council, NMC and The Local Supervising Authority Midwifery Officers National Forum, LSAMONF, 2008). To become a SoM requires a midwife to be nominated by her peers and to undertake a course at Masters Level over two ...
Efficient Multi-Agent Reinforcement Learning through Automated Supervision (Short Paper)
Multi-Agent Reinforcement Learning (MARL) algorithms suffer from slow convergence and even divergence, especially in large-scale systems. In this work, we develop a supervision framework to speed up the convergence of MARL algorithms in a network of agents. The framework defines an organizational structure for automated supervision and a communication protocol for exchanging information between...
Efficient multi-agent reinforcement learning through automated supervision
Multi-Agent Reinforcement Learning (MARL) algorithms suffer from slow convergence and even divergence, especially in large-scale systems. In this work, we develop a supervision framework to speed up the convergence of MARL algorithms in a network of agents. The framework defines an organizational structure for automated supervision and a communication protocol for exchanging information between...
A Comparative Study of Self-Supervision and the Self-Efficacy of Iranian EFL Teachers and Those of Intermediate Adult Learners
The present study was conducted to examine the relationship between the self-supervision and the self-efficacy of Iranian EFL teachers and also the relationship between the self-supervision and the self-efficacy of intermediate adult learners individually. To this end, 40 EFL teachers and 55 intermediate adult learners were selected from two branches of Kish Language Institute. In this study, “...